Abstract

In this paper we calculate the average number of cliques in random scale-free networks. We consider first the hidden variable ensemble and subsequently the Molloy-Reed ensemble. In both cases we find that cliques, i.e. fully connected subgraphs, appear also when the average degree is finite. This is in contrast to what happens in Erdős and Rényi graphs, in which a diverging average degree is required to observe cliques of size $c > 3$. Moreover, we show that in random scale-free networks the clique number, i.e. the size of the largest clique present in the network, diverges with the system size.
When graphs are used to represent a variety of real technological, social and biological systems they are called networks. The analysis of many real networks reveals that, while different networks differ from one another in their local structure, characterized by operational modules or motifs that are a signature of their function (Milo2002; Vazquez200), many networks share some important common characteristics (Albert2002; Dorogovtsev2003; Dobrin2004; Pastor-Satorras2004). In particular, a large variety of networks have been shown to display a scale-free degree distribution $P(k) \sim k^{-\gamma}$ with non-universal exponents $\gamma$. The scale-free degree distribution strongly affects the local topology of the networks. For example, scale-free networks with an exponent $\gamma < 3$ have a very large number of small loops (Bianconi200; Bianconi2005), which is a very distinctive feature with respect to Erdős and Rényi (ER) networks with finite average connectivity (Janson2000; Marinari2004). In turn, this very peculiar local structure induces many relevant effects on the dynamics defined on these networks (Dorogovtsev2002; Leone2002; Havlin2000; Pastor-Satorras2001).



A special type of network subgraph is the clique, i.e. a fully connected subset of nodes of the network. Cliques are relevant objects for the study of real networks; in fact, cliques and overlapping sets of cliques provide relevant insights into the community structure of networks (Derenyi2005; Palla200). In random Erdős and Rényi graphs of $N$ nodes and linking probability $p(N)$, the expected number of cliques of size $c$ is given by (Janson2000)

$$\langle N_c\rangle = \binom{N}{c}\,p(N)^{c(c-1)/2}. \qquad (1)$$



Consequently, in the large-$N$ limit the expected number of cliques of size $c > 3$ is different from zero only when the average degree $\langle k\rangle \simeq pN$ diverges as $N \to \infty$. Indeed, setting $p = \langle k\rangle/N$ in Eq. (1) gives $\langle N_c\rangle \simeq \langle k\rangle^{c(c-1)/2}N^{c(3-c)/2}/c!$, which vanishes for $c > 3$ at fixed $\langle k\rangle$. Special attention in the mathematics literature is given to the maximal size of a clique present in a graph $G$, i.e. its clique number $c_{\max}$. The clique number is an important characteristic of networks and also constitutes a lower bound on the chromatic number, since in the coloring problem one is forced to color all the nodes of a clique with different colors. In (Bianconi2006) we showed that scale-free networks with $\gamma < 3$ have many more and larger cliques than random Erdős and Rényi networks.
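As a quick numerical illustration of Eq. (1), the sketch below evaluates the exact expected clique count with $p = \langle k\rangle/N$ at fixed mean degree (the function name and the parameter values are illustrative choices, not taken from the paper). The triangle count tends to the finite value $\langle k\rangle^3/6$, while the expected number of 4-cliques decays as $N^{-2}$.

```python
from math import comb

def er_expected_cliques(n: int, c: int, mean_degree: float) -> float:
    """Expected number of c-cliques in G(N, p) with p = <k>/N, i.e. Eq. (1)."""
    p = mean_degree / n
    # C(N, c) ways to pick the nodes, times p to the number of pairs inside the clique
    return comb(n, c) * p ** (c * (c - 1) // 2)

if __name__ == "__main__":
    for n in (10**3, 10**4, 10**5):
        print(n, er_expected_cliques(n, 3, 4.0), er_expected_cliques(n, 4, 4.0))
```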

In this paper we provide the complete derivation of the theoretical expectation for the number of cliques in random scale-free networks. We do this by evaluating the average number of cliques and its second moment. We find the surprising result that cliques of size $c > 3$ are present also in networks with finite average connectivity, i.e. networks with $\gamma \in (2,3]$. Moreover, we prove that the clique number $c_{\max}$ of networks with $\gamma < 3$ diverges with the network size, by providing upper and lower bounds for the clique number. These bounds arise from classical inequalities for probabilities which involve the first and second moments of the number of cliques. These moments can be computed in different ensembles of random graphs (Molloy199; Goh2001). The main section of this paper is devoted to the calculation of the average number of cliques and its second moment in the hidden variable ensemble (Caldarelli2002; Boguna2003). Subsequently, the derivation of the average number of cliques is extended to the Molloy-Reed ensemble (Molloy199), where the same scaling of the number of cliques is found. Finally, the conclusions are given.






1 Hidden variable ensemble 



In this ensemble the prescription to generate a class of scale-free networks with exponent $\gamma$ is the following: i) assign to each node $i$ of the graph a hidden continuous variable $q_i$ distributed according to a distribution $p(q)$; ii) link each pair of nodes with hidden variables $q, q'$ with probability $r(q,q')$. When the hidden variable distribution is scale-free, $p(q) = p_0q^{-\gamma}$ for $q \in [m, Q]$, and the linking probability is linear in both $q$ and $q'$, i.e. $r(q,q') = qq'/(\langle q\rangle N)$, we obtain a random uncorrelated scale-free network. In this specific case a cutoff

$$Q \sim \begin{cases} N^{1/\gamma} & \text{for } \gamma \in (1,2]\\ N^{1/2} & \text{for } \gamma \in (2,3]\end{cases}$$

is needed to keep the linking probability smaller than one, i.e. $Q^2/(\langle q\rangle N) < 1$. For $\gamma \in (1,2]$ the mean $\langle q\rangle \sim Q^{2-\gamma}$ itself grows with the cutoff, so the condition becomes $Q^{\gamma} \lesssim N$.
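A minimal sketch of the hidden-variable prescription, assuming inverse-transform sampling for $p(q) = p_0q^{-\gamma}$ on $[m, Q]$ and approximating $\langle q\rangle N$ by the sum of the sampled hidden variables; the function names are hypothetical. The `min(1.0, ...)` guard only matters when a sampled pair violates $Q^2/(\langle q\rangle N) < 1$, which can happen at $\epsilon = 0$ because the empirical mean fluctuates.

```python
import random

def sample_hidden_variables(n, gamma, m, q_max, rng):
    """Draw n values from p(q) proportional to q^(-gamma) on [m, q_max] (gamma != 1) by inverting the CDF."""
    a = 1.0 - gamma
    lo, hi = m ** a, q_max ** a
    return [(lo + rng.random() * (hi - lo)) ** (1.0 / a) for _ in range(n)]

def hidden_variable_graph(q, rng):
    """Link each pair (i, j) independently with probability r = q_i q_j / (<q> N)."""
    norm = sum(q)  # empirical estimate of <q> N
    edges = []
    for i in range(len(q)):
        for j in range(i + 1, len(q)):
            if rng.random() < min(1.0, q[i] * q[j] / norm):
                edges.append((i, j))
    return edges

if __name__ == "__main__":
    rng = random.Random(42)
    n, gamma, m = 1000, 2.5, 1.0
    mean_q = m * (gamma - 1) / (gamma - 2)   # <q> of the untruncated tail, valid for 2 < gamma < 3
    q_max = (mean_q * n) ** 0.5              # structural cutoff Q = sqrt(<q> N), i.e. epsilon = 0
    q = sample_hidden_variables(n, gamma, m, q_max, rng)
    edges = hidden_variable_graph(q, rng)
    print(len(edges), "edges, mean degree", 2 * len(edges) / n)
```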



1.1 Average number of cliques 



A clique $\mathcal{C}$ of size $c$ is a set of $c$ distinct nodes $\mathcal{C} = \{i_1, \ldots, i_c\}$, each one connected with all the others. For each choice of the nodes, the probability that they are connected in a clique is

$$\prod_{i<j\in\mathcal{C}} r(q_i,q_j). \qquad (2)$$
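For a fixed set of hidden variables, the expected number of $c$-cliques is simply the product (2) summed over all $c$-subsets of nodes; Eq. (3) reorganizes the same sum by grouping nodes with equal $q$. The brute-force sketch below (hypothetical helper, feasible only for small $N$) estimates $\langle q\rangle N$ by $\sum_i q_i$ and caps each linking probability at 1.

```python
from itertools import combinations
from math import prod

def expected_cliques_exact(q, c):
    """Sum Eq. (2) over all c-subsets: expected number of c-cliques for fixed hidden variables."""
    norm = sum(q)  # <q> N estimated from the given hidden variables
    total = 0.0
    for nodes in combinations(q, c):
        total += prod(min(1.0, a * b / norm) for a, b in combinations(nodes, 2))
    return total

# Six nodes with q = 2: every pair links with probability 2 * 2 / 12 = 1/3,
# so the result is C(6, 3) * (1/3)**3, approximately 0.7407 (= 20/27).
print(expected_cliques_exact([2.0] * 6, 3))
```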



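Consequently, the average number of cliques of size $c$ is given by the number of ways in which we can pick $c$ nodes in the network, with $n(q)$ nodes having hidden variable $q_i \in (q, q+\Delta q)$, multiplied by the probability that each pair of nodes of this set is linked. Since in random scale-free networks we have $r(q,q') = qq'/(\langle q\rangle N)$, each node of the clique contributes a factor $q^{c-1}/(\langle q\rangle N)^{(c-1)/2}$ and we can write

$$\langle N_c\rangle = \sum_{\{n(q)\}}\prod_q\binom{N(q)}{n(q)}\left[\frac{q^{c-1}}{(\langle q\rangle N)^{(c-1)/2}}\right]^{n(q)} \qquad (3)$$

where $N(q) = Np(q)\Delta q$ is the number of nodes of the network with hidden variable $q_i \in (q, q+\Delta q)$, and where the sum extends over all the sequences $\{n(q)\}$ satisfying $\sum_q n(q) = c$. Introducing an integral representation of the delta function $\delta\left(\sum_q n(q) - c\right)$ and performing the summation over $n(q)$ we get

$$\langle N_c\rangle = \int_{-i\pi}^{i\pi}\frac{dy}{2\pi i}\exp\left\{yc + N\left\langle\log\left[1+\theta^{c-1}e^{-y}\right]\right\rangle\right\} \qquad (4)$$

where we have taken the limit $\Delta q \to 0$. In (4) we have introduced the variable $\theta$ defined as

$$\theta = \frac{q}{\sqrt{\langle q\rangle N}}, \qquad (5)$$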



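and we have indicated with $\langle\cdot\rangle$ the average over the distribution $p(q)$. Solving the integral in (4) by the saddle point method one finds

$$\langle N_c\rangle \simeq \frac{e^{f(y^*)}}{\sqrt{2\pi|f''(y^*)|}} \qquad (6)$$

with $f(y) = yc + N\langle\log[1+\theta^{c-1}e^{-y}]\rangle$ and $y^*$ fixed by the saddle point equation $f'(y^*) = 0$, i.e.

$$\frac{c}{N} = \left\langle\frac{\theta^{c-1}e^{-y^*}}{1+\theta^{c-1}e^{-y^*}}\right\rangle. \qquad (7)$$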
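Eqs. (6) and (7) can also be evaluated numerically. The sketch below rests on an assumption: the average $\langle\cdot\rangle$ is replaced by an empirical average over a finite list of $\theta$ values. The saddle point (7) is found by bisection, since its right-hand side decreases monotonically in $y$, and $\log\langle N_c\rangle$ is returned to avoid overflow. The function name is hypothetical.

```python
import math

def saddle_point_log_cliques(theta, n, c):
    """Return (log <N_c>, y*) from Eqs. (6)-(7), with <.> an empirical average over `theta`."""
    a = [t ** (c - 1) for t in theta]

    def occupancy(y):
        # N < theta^(c-1) e^(-y) / (1 + theta^(c-1) e^(-y)) >, decreasing in y
        ey = math.exp(y)
        return n * sum(x / (ey + x) for x in a) / len(a)

    lo, hi = -700.0, 700.0  # wide bracket; a root exists for 0 < c < n
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if occupancy(mid) > c:
            lo = mid
        else:
            hi = mid
    y = 0.5 * (lo + hi)
    ey = math.exp(y)
    f = y * c + n * sum(math.log1p(x / ey) for x in a) / len(a)
    f2 = n * sum(x * ey / (ey + x) ** 2 for x in a) / len(a)  # f''(y*)
    return f - 0.5 * math.log(2 * math.pi * f2), y
```

With all $\theta$ equal the exact answer is $\binom{N}{c}\theta^{c(c-1)}$, so the saddle point result can be compared against it; the residual discrepancy of about 2% for $c = 4$ is the usual Stirling error in $c!$.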

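If we assume that the cutoff of the hidden variable distribution is $Q = (1-\epsilon)\sqrt{\langle q\rangle N}$ with $\epsilon \ge 0$, the maximal clique size depends both on the exponent $\gamma$ and on $\epsilon$. The dependence on $\epsilon$ reflects the fact that when $\epsilon = 0$ the linking probability $r(q,q')$ between the highest degree nodes approaches one. From the saddle point equation (7) we can see that the asymptotic expansion

$$e^{y^*} \simeq \frac{N}{c}\langle\theta^{c-1}\rangle \qquad (8)$$

is valid as long as $\theta^{c-1}e^{-y^*} \ll 1$ for every node, i.e. as long as

$$\frac{c}{N}\,\frac{Q^{c-1}}{(\langle q\rangle N)^{(c-1)/2}\,\langle\theta^{c-1}\rangle} \simeq \frac{c(c-\gamma)\,Q^{\gamma-1}}{p_0N} \ll 1, \qquad (9)$$

where in the last step we used $\langle\theta^{c-1}\rangle \simeq p_0Q^{c-\gamma}/[(c-\gamma)(\langle q\rangle N)^{(c-1)/2}]$, valid for $c > \gamma$; i.e. until $c < c^* \sim (NQ^{1-\gamma})^{1/2}$.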
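To make the scaling of $c^*$ explicit, one can insert the structural cutoff into the validity criterion of the expansion (8), which for a power-law $p(q)$ reduces to $c(c-\gamma)Q^{\gamma-1}/(p_0N) \ll 1$. Taking $c \gg \gamma$ and dropping constants gives the following order-of-magnitude sketch:

$$
c^* \sim \left(NQ^{1-\gamma}\right)^{1/2} \sim
\begin{cases}
\left(N\cdot N^{(1-\gamma)/2}\right)^{1/2} = N^{(3-\gamma)/4}, & 2<\gamma<3,\quad Q\sim N^{1/2},\\[4pt]
\left(N\cdot N^{(1-\gamma)/\gamma}\right)^{1/2} = N^{1/(2\gamma)}, & 1<\gamma<2,\quad Q\sim N^{1/\gamma}.
\end{cases}
$$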

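Consequently, for clique sizes $c < c^*$ one has the asymptotic expression

$$\langle N_c\rangle \simeq \frac{1}{\sqrt{2\pi c}}\left[\frac{eN\langle\theta^{c-1}\rangle}{c}\right]^{c}. \qquad (10)$$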
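For completeness, the step from Eqs. (6) to (8) to Eq. (10) can be written out. This is a sketch, valid for $c < c^*$ where $\theta^{c-1}e^{-y^*} \ll 1$ for all nodes, so that $\log(1+x) \simeq x$ and $x/(1+x)^2 \simeq x$:

$$
\begin{aligned}
f(y^*) &\simeq y^*c + N\langle\theta^{c-1}\rangle e^{-y^*} = c\log\frac{N\langle\theta^{c-1}\rangle}{c} + c,\\
f''(y^*) &\simeq N\langle\theta^{c-1}\rangle e^{-y^*} = c,\\
\langle N_c\rangle &\simeq \frac{1}{\sqrt{2\pi c}}\left[\frac{eN\langle\theta^{c-1}\rangle}{c}\right]^{c} \simeq \frac{\left(N\langle\theta^{c-1}\rangle\right)^{c}}{c!},
\end{aligned}
$$

where the last form uses Stirling's approximation $c! \simeq \sqrt{2\pi c}\,(c/e)^c$, i.e. in this regime the expansion keeps only the leading term $(N\langle\theta^{c-1}\rangle)^c/c!$.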
Finding an upper bound for the clique number (the maximal clique size) is a bit more involved. We start from the classical inequality

$$P(N_c > 0) \le \langle N_c\rangle \qquad (11)$$

and the expression (6), together with (7), for the average number of cliques $\langle N_c\rangle$. If $\langle N_c\rangle \to 0$ in the $N \to \infty$ limit, then $c$ fixes an upper bound $\bar c$ for the maximal clique size of the network.

In Appendix A we show that the clique number satisfies $c_{\max} < \bar c$, where, for $\epsilon = 0$, $\bar c$ satisfies the condition

= 1. (12) 



On the other hand, this expression also provides an upper bound for the case $\epsilon \neq 0$, since in this case $\bar c$ defined in Eq. (12) is still within the range of validity of the asymptotic expansion (10) and corresponds to an expected number of cliques $\langle N_c\rangle \to 0$ as $N \to \infty$.

The values of $\bar c$ and $c^*$ depend both on the exponent $\gamma$ and on the value of $\epsilon$. In fact, networks with different values of $\gamma$ have different structural cutoffs.

• Networks with $\gamma > 3$

These networks have a natural cutoff $Q = aN^{1/(\gamma-1)}$. Using this cutoff when performing the average in Eq. (12), we find $\bar c = 3$ in the limit $N \to \infty$. Therefore these networks, like the Erdős and Rényi networks, have maximal clique size $c_{\max} \le 3$.

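• Networks with $2 < \gamma < 3$

These networks have a structural cutoff $Q = (1-\epsilon)\sqrt{\langle q\rangle N}$ and, for $c < c^*$, the average number of cliques is given by

$$\langle N_c\rangle \simeq \frac{1}{\sqrt{2\pi c}}\left[\frac{\mathcal{A}_{\gamma,\langle q\rangle}\,e\,N^{(3-\gamma)/2}(1-\epsilon)^{c-\gamma}}{c(c-\gamma)}\right]^{c} \qquad (13)$$

with $\mathcal{A}_{\gamma,\langle q\rangle}$ a constant depending on the power-law exponent $\gamma$ and on the average connectivity $\langle q\rangle$ of the graph. Moreover, the values of $c^*$ and $\bar c$, defined in Eqs. (9) and (12) respectively, depend on the system size $N$, on the exponent $\gamma$ and on $\epsilon$, as shown in Table 1.

We observe that while for $\epsilon > 0$ the asymptotic expansion is valid well above the upper bound $\bar c$, for $\epsilon = 0$ the upper bound $\bar c$ and the limit $c^*$ of validity of the asymptotic expansion have the same order of magnitude, i.e. $c^* \sim \bar c \sim N^{(3-\gamma)/4}$, but $\bar c > c^*$.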
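The first-moment argument can be made concrete for $2 < \gamma < 3$ and $\epsilon > 0$. The sketch below uses hypothetical function names and rests on these assumptions: $p(q) = p_0q^{-\gamma}$ is normalized on $[m, Q]$, the cutoff is fixed self-consistently by iterating $Q = (1-\epsilon)\sqrt{\langle q\rangle N}$, and $\langle\theta^{c-1}\rangle$ is computed from the exact power-law integral. It evaluates $\log\langle N_c\rangle$ from Eq. (10) and returns the smallest $c$ at which $\langle N_c\rangle < 1$, a numerical proxy for the upper bound $\bar c$. This is only meaningful where Eq. (10) holds, i.e. for $\epsilon > 0$, where $\bar c$ stays well below $c^*$.

```python
import math

def _moments(q_max, gamma, m):
    """p0 and <q> for p(q) = p0 q^(-gamma) normalized on [m, q_max] (gamma != 1, 2)."""
    p0 = (gamma - 1) / (m ** (1 - gamma) - q_max ** (1 - gamma))
    mean_q = p0 * (q_max ** (2 - gamma) - m ** (2 - gamma)) / (2 - gamma)
    return p0, mean_q

def structural_cutoff(n, gamma, m=1.0, eps=0.1, iters=100):
    """Iterate Q = (1 - eps) sqrt(<q>(Q) N) to a fixed point; return (Q, p0, <q>)."""
    q_max = math.sqrt(n)
    for _ in range(iters):
        _, mean_q = _moments(q_max, gamma, m)
        q_max = (1 - eps) * math.sqrt(mean_q * n)
    p0, mean_q = _moments(q_max, gamma, m)
    return q_max, p0, mean_q

def log_mean_cliques(n, c, gamma, m=1.0, eps=0.1):
    """log <N_c> from Eq. (10); needs c > gamma so that s = c - gamma > 0."""
    q_max, p0, mean_q = structural_cutoff(n, gamma, m, eps)
    s = c - gamma
    # log <theta^(c-1)> = log[p0 (Q^s - m^s) / (s (<q> N)^((c-1)/2))], kept in logs to avoid overflow
    log_theta = (math.log(p0) + s * math.log(q_max) + math.log1p(-(m / q_max) ** s)
                 - math.log(s) - 0.5 * (c - 1) * math.log(mean_q * n))
    return c * (1.0 + math.log(n) + log_theta - math.log(c)) - 0.5 * math.log(2 * math.pi * c)

def upper_bound_cbar(n, gamma, m=1.0, eps=0.1):
    """Smallest clique size c >= 3 at which Eq. (10) predicts <N_c> < 1."""
    if not (2 < gamma < 3 and eps > 0):
        raise ValueError("sketch assumes 2 < gamma < 3 and eps > 0")
    c = 3
    while log_mean_cliques(n, c, gamma, m, eps) >= 0:
        c += 1
    return c

if __name__ == "__main__":
    for n in (10**4, 10**6, 10**8):
        print(n, upper_bound_cbar(n, gamma=2.5))
```

The resulting $\bar c$ grows only slowly with $N$ for $\epsilon > 0$, because the factor $(1-\epsilon)^{c}$ in Eq. (10) eventually dominates the power of $N$.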

• Networks with $1 < \gamma < 2$

These networks have a structural cutoff defined as in the case $2 < \gamma < 3$, i.e. $Q = (1-\epsilon)\sqrt{\langle q\rangle N}$. Given this expression and the divergence of the average degree with the upper cutoff, $\langle q\rangle \sim Q^{2-\gamma}$, the upper cutoff scales with the network size as $Q \sim N^{1/\gamma}$. The asymptotic expansion gives, for the average number of cliques of sizes $c < c^*$,

$$\langle N_c\rangle \simeq \frac{1}{\sqrt{2\pi c}}\left[\frac{\mathcal{B}_{\gamma}\,e\,N^{1/\gamma}(1-\epsilon)^{c-3+2/\gamma}}{c(c-\gamma)}\right]^{c} \qquad (14)$$

where $\mathcal{B}_{\gamma}$ is a function of the power-law exponent $\gamma$ and of the lower cutoff $m$ of the distribution.

The values of $c^*$ and $\bar c$ defined in Eqs. (9) and (12) depend on the system size $N$, on the exponent $\gamma$ and on $\epsilon$; their scaling is shown in Table 1. Also in this range of $\gamma$, for $\epsilon > 0$ the asymptotic expansion is valid well above the upper bound $\bar c$, while for $\epsilon = 0$ the upper bound and the limit $c^*$ of validity of the asymptotic expansion have the same order of magnitude, i.e. $c^* \sim \bar c \sim N^{1/(2\gamma)}$, but $\bar c > c^*$.



1.2 Second moment of the number of cliques



In order to derive a lower bound on the clique number $c_{\max}$ we use a classical relation of probability theory (Janson2000), i.e.

$$P(N_c > 0) \ge \frac{\langle N_c\rangle^2}{\langle N_c^2\rangle} \qquad (15)$$

where $\langle N_c^2\rangle$ is the second moment of the number of cliques of size $c$ in the considered random graph ensemble. Consequently, if $\langle N_c\rangle^2/\langle N_c^2\rangle \ge K$, we are guaranteed that the typical graph contains cliques of size $c$ with probability $P(N_c > 0) \ge K > 0$. We therefore proceed to calculate the second moment $\langle N_c^2\rangle$ of the number of cliques. To do this, we count the average number of pairs of cliques of size $c$ present in the graph with an overlap of $o = 0, \ldots, c$ nodes. We use $\{n(q)\}$ to indicate the number of nodes with hidden variable $q$ belonging to the first clique, $\{n_0(q)\}$ the number of nodes belonging to the overlap, and $\{n'(q)\}$ the number of nodes belonging to the second clique but not to the overlap. We consider only sequences $\{n(q)\}$, $\{n'(q)\}$, $\{n_0(q)\}$ which satisfy $\sum_q n(q) = c$, $\sum_q n_0(q) = o$ and $\sum_q n'(q) = c - o$. With these conditions, and then substituting the conditions with delta functions, we get

$$\langle N_c^2\rangle = \sum_{o=0}^{c}\sum_{\{n(q)\}}\sum_{\{n'(q)\}}\sum_{\{n_0(q)\}}\prod_q\binom{N(q)}{n(q)}\binom{N(q)-n(q)}{n'(q)}\binom{n(q)}{n_0(q)}\theta^{g(q)} = \sum_{o=0}^{c}\int\frac{dy\,dy'\,dy_0}{(2\pi i)^3}\,e^{F(y,y',y_0)} \qquad (16)$$

where $g(q) = (c-1)\left(n(q)+n'(q)\right) + (c-o)\,n_0(q)$ and

$$F(y,y',y_0) = yc + y'(c-o) + y_0o + N\left\langle\log\left[1 + \left(e^{-y}+e^{-y'}\right)\theta^{c-1} + e^{-(y+y_0)}\theta^{2c-o-1}\right]\right\rangle.$$
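The exponent $g(q)$ in Eq. (16) counts, for each node, its degree in the union of the two overlapping cliques. Summed over the nodes it must therefore equal twice the number of links, $2c(c-1) - o(o-1)$. A small brute-force check of this bookkeeping (hypothetical helper names):

```python
from itertools import combinations

def union_degrees(c, o):
    """Degrees in the union of two c-cliques sharing exactly o nodes, and the number of links."""
    first = range(c)                    # nodes 0 .. c-1
    second = range(c - o, 2 * c - o)    # shares nodes c-o .. c-1 with `first`
    edges = {frozenset(e) for e in combinations(first, 2)}
    edges |= {frozenset(e) for e in combinations(second, 2)}
    degree = {}
    for e in edges:
        for v in e:
            degree[v] = degree.get(v, 0) + 1
    return degree, len(edges)

def exponent_from_g(c, o):
    """Sum of g(q) over nodes: (c-1) for each of the c + (c-o) clique memberships, plus (c-o) per overlap node."""
    return (c - 1) * (c + (c - o)) + (c - o) * o
```

An overlap node has degree $(c-1)+(c-o) = 2c-o-1$, which is exactly the power of $\theta$ multiplying $e^{-(y+y_0)}$ in $F$, while every other node has degree $c-1$.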
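The saddle point equations give $y = y'$ and

$$\frac{c-o}{N} = \left\langle\frac{\theta^{c-1}e^{-y}}{D}\right\rangle, \qquad \frac{o}{N} = \left\langle\frac{\theta^{2c-o-1}e^{-(y+y_0)}}{D}\right\rangle,$$

with $D = 1 + 2\theta^{c-1}e^{-y} + \theta^{2c-o-1}e^{-(y+y_0)}$.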



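Using the asymptotic expansions of these saddle point equations, valid for $c < c^*$, we find, up to Stirling's approximation of the factorials,

$$\langle N_c^2\rangle \simeq \sum_{o=0}^{c}\frac{\left(N\langle\theta^{c-1}\rangle\right)^{2(c-o)}}{[(c-o)!]^2}\,\frac{\left(N\langle\theta^{2c-o-1}\rangle\right)^{o}}{o!}. \qquad (17)$$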



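Using also the asymptotic expression (10) for $\langle N_c\rangle^2$, we can express the ratio as

$$\frac{\langle N_c^2\rangle}{\langle N_c\rangle^2} \simeq \sum_{o=0}^{c}\frac{(c!)^2}{[(c-o)!]^2\,o!}\left[\frac{\langle\theta^{2c-o-1}\rangle}{N\langle\theta^{c-1}\rangle^2}\right]^{o} \simeq \sum_{o=0}^{c}\frac{(c!)^2}{[(c-o)!]^2\,o!}\left[\frac{(c-\gamma)^2}{(2c-o-\gamma)\,p_0N(\langle q\rangle N)^{(1-\gamma)/2}(1-\epsilon)^{o-\gamma}}\right]^{o} \qquad (18)$$

where in the last step we used the power-law form of $\langle\theta^{s}\rangle$ with the cutoff $Q = (1-\epsilon)\sqrt{\langle q\rangle N}$.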
We notice that in the limit $c \to \infty$ we have



(18) 



< c"- 



1 



e°{c - 0)"-° e° (1 - -) 



(19) 



Using this limit behavior and Stirling's approximation for factorials, we get



7 



< 



1 + 



c(c-7)(l-e)(^-'^)e 



c(c - 7) 

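If $\epsilon = 0$, it is useful to define the clique size $\tilde c$ satisfying

$$\frac{\tilde c(\tilde c-\gamma)\,e}{\bar c(\bar c-\gamma)} = \frac{1}{\tilde c}, \qquad (21)$$

i.e. $\tilde c \sim \bar c^{2/3}$. Then, if $c = a\tilde c^{1-\eta}$, we have in the limit $N \to \infty$, $c \to \infty$,

$$\frac{\langle N_c^2\rangle}{\langle N_c\rangle^2} \le \begin{cases} 1 & \text{if } \eta > 0,\\ e & \text{if } \eta = 0,\\ \infty & \text{if } \eta < 0.\end{cases} \qquad (22)$$

From Eqs. (15) and (22), for $c = a\tilde c^{1-\eta}$ with $\tilde c$ defined in (21), one finds

$$P(N_c > 0) \ge \begin{cases} 1 & \text{if } \eta > 0,\\ 1/e & \text{if } \eta = 0,\\ 0 & \text{if } \eta < 0.\end{cases}$$

Consequently, the network almost surely contains cliques of sizes $c < a\tilde c = a'\bar c^{2/3}$, with $a > 0$.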

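If $\epsilon > 0$ and $c = \bar c - a\bar c^{\eta}$ with $a > 0$, we have in the limit $N \to \infty$, $c \to \infty$,

$$\frac{\langle N_c^2\rangle}{\langle N_c\rangle^2} \to \begin{cases} 1 & \text{if } \eta > 0,\\ \infty & \text{if } \eta < 0.\end{cases} \qquad (23)$$

From Eqs. (15) and (23) it follows that, as long as $c = \bar c - a\bar c^{\eta}$,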

P{K > 0) > < 



1 if 77 > 
if 77 = 0, a = 



Consequently, the network almost surely contains cliques of sizes $c < \tilde c$ with $\tilde c = \bar c - a\bar c^{\eta}$ and $a, \eta > 0$.





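| | $\epsilon = 0$ | $\epsilon > 0$ |
|---|---|---|
| $\gamma > 3$ | $c_{\max} \le 3$ | $c_{\max} \le 3$ |
| $2 < \gamma < 3$ | $\tilde c \le c_{\max} \le \bar c$; $c^* \sim \bar c \sim N^{(3-\gamma)/4}$; $\tilde c = a\bar c^{2/3}$ with $a > 0$ | $\tilde c \le c_{\max} \le \bar c$; $\tilde c = \bar c - a\bar c^{\eta}$ with $a, \eta > 0$ |
| $1 < \gamma < 2$ | $\tilde c \le c_{\max} \le \bar c$; $c^* \sim \bar c \sim N^{1/(2\gamma)}$; $\tilde c = a'\bar c^{2/3}$ with $a' > 0$ | $\tilde c \le c_{\max} \le \bar c$; $\tilde c = \bar c - a\bar c^{\eta}$ with $a, \eta > 0$ |

Table 1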



Since a graph containing a clique of size $c$ clearly also contains cliques of smaller size, we have proved that typical networks have a finite probability of containing cliques of any size $c < \tilde c$.
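Inequality (15), on which this lower bound rests, is the second moment method. For a non-negative variable such as $N_c$ it follows in one line from the Cauchy–Schwarz inequality, writing $\mathbb{1}[\cdot]$ for the indicator function:

$$\langle N_c\rangle = \left\langle N_c\,\mathbb{1}[N_c>0]\right\rangle \le \langle N_c^2\rangle^{1/2}\,\left\langle\mathbb{1}[N_c>0]^2\right\rangle^{1/2} = \langle N_c^2\rangle^{1/2}\,P(N_c>0)^{1/2},$$

and squaring both sides gives $P(N_c>0) \ge \langle N_c\rangle^2/\langle N_c^2\rangle$.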



1.3 Average number of cliques passing through a node 



To find the expected number of cliques of size $c$ passing through a given node with hidden variable $q$, we can repeat the arguments proposed for the calculation of the first moment, with the difference that we integrate over all the hidden variables of the nodes in the clique except for the hidden variable $q_{i_1} = q$ of the chosen node. Following these arguments one finds, for cliques of size $c < c^*$,

$$\langle N_c(q)\rangle \simeq \left(\frac{q}{\sqrt{\langle q\rangle N}}\right)^{c-1}\frac{1}{\sqrt{2\pi(c-1)}}\left[\frac{eN\langle\theta^{c-1}\rangle}{c-1}\right]^{c-1}. \qquad (24)$$

Consequently, nodes with higher hidden variable $q$ are expected to be part of more cliques, since $\langle N_c(q)\rangle \propto q^{c-1}$.
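The monotonic dependence on $q$ can be checked by brute force for fixed hidden variables. The sketch below (hypothetical helper; $\langle q\rangle N$ estimated by $\sum_i q_i$, linking probabilities capped at 1) sums the clique probability (2) over all $c$-cliques containing a chosen node.

```python
from itertools import combinations
from math import prod

def expected_cliques_through(q, i, c):
    """Expected number of c-cliques that contain node i, for fixed hidden variables q."""
    norm = sum(q)

    def r(a, b):
        return min(1.0, q[a] * q[b] / norm)

    others = [j for j in range(len(q)) if j != i]
    total = 0.0
    for rest in combinations(others, c - 1):
        nodes = (i,) + rest
        total += prod(r(a, b) for a, b in combinations(nodes, 2))
    return total

if __name__ == "__main__":
    q = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    print([round(expected_cliques_through(q, i, 3), 4) for i in range(len(q))])
```

Summing over all nodes counts every clique once per member, so the total equals $c$ times the expected number of cliques, which is a convenient sanity check.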
2 Molloy-Reed ensemble

The counting of the number of cliques in the Molloy-Reed ensemble (Molloy199) follows a procedure very similar to the one used for the hidden variable ensemble, and gives similar results. To construct a Molloy-Reed network one proceeds as follows: i) a degree is assigned to each node of the network following the desired degree distribution with cutoff $K$,

$$K \sim \begin{cases} N^{1/\gamma} & \text{for } \gamma \in (1,2]\\ N^{1/2} & \text{for } \gamma \in (2,3]\\ N^{1/(\gamma-1)} & \text{for } \gamma \in (3,\infty)\end{cases}$$

Degree sequences for which $\langle k\rangle N = \sum_i k_i$ is odd are discarded; ii) the edges coming out of the nodes are randomly matched until all edges are connected. The structural cutoff for $\gamma < 3$ ensures that the probability of double links and tadpoles (self-loops) is small (Bianconi2005).



To calculate $\langle N_c\rangle$ in this ensemble, one first has to count in how many ways it is possible to have a clique of size $c$ in the network, and weight the result with the fraction of possible networks in the ensemble which contain the clique. Let us first state that the total number of graphs in the Molloy-Reed ensemble is $(\langle k\rangle N - 1)!!$. Indeed, when constructing the network by linking $\langle k\rangle N$ unconnected edges, one starts by taking one edge at random and connecting it to one of the $\langle k\rangle N - 1$ possible partners. Then one proceeds by taking another edge and linking it to one of the remaining $\langle k\rangle N - 3$ possible partners, and so on, thus giving rise to one of the $(\langle k\rangle N - 1)!!$ possible networks. By similar arguments one shows that the total number of networks containing a given clique of size $c$ is $[\langle k\rangle N - c(c-1) - 1]!!$. On the other hand, the total number of cliques of size $c$ in the Molloy-Reed ensemble is given by the number of ways one can choose $c$ nodes $\{i_1, \ldots, i_c\}$ of connectivity $\{k_1, k_2, \ldots, k_c\}$ and connect each pair of them. The number of ways one can choose the edges coming out of the nodes to form the clique is given by

$$\prod_{j=1}^{c}\frac{k_j!}{(k_j-c+1)!}.$$
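The two counts used above, $(\langle k\rangle N - 1)!!$ configurations in total and $[\langle k\rangle N - c(c-1) - 1]!!$ configurations containing a prescribed set of $c(c-1)/2$ stub pairs, are numbers of perfect matchings of an even set of stubs. A brute-force check on small stub sets (hypothetical helper names):

```python
def matchings(stubs):
    """Yield every perfect matching of the tuple `stubs` as a list of pairs."""
    if not stubs:
        yield []
        return
    first, rest = stubs[0], stubs[1:]
    for k, partner in enumerate(rest):
        remaining = rest[:k] + rest[k + 1:]
        for m in matchings(remaining):
            yield [(first, partner)] + m

def double_factorial(n):
    """n!! for n >= -1, with the convention (-1)!! = 0!! = 1."""
    result = 1
    while n > 1:
        result *= n
        n -= 2
    return result

print(len(list(matchings(tuple(range(6))))))  # 15 = 5!!
```

Fixing $L$ pairs in advance removes $2L$ stubs from the matching problem, which is why the count of configurations containing a given clique drops from $(M-1)!!$ to $(M-2L-1)!!$ with $2L = c(c-1)$.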



Consequently, the average number of cliques in the Molloy-Reed ensemble is given by

$$\langle N_c\rangle = W_{N,c}\sum_{\{n(k)\}}\prod_{k=c-1}^{K}\binom{N(k)}{n(k)}\left[\frac{k!}{(k-c+1)!}\right]^{n(k)} \qquad (25)$$

where $N(k) = NP(k)$ is the number of nodes with connectivity $k$ present in the network, $n(k)$ is the number of these nodes belonging to the clique, $K$ is the cutoff of the degree distribution, and the sum over $\{n(k)\}$ is restricted to sequences such that $\sum_k n(k) = c$. Moreover, we use the definition $W_{N,c} = [\langle k\rangle N - c(c-1) - 1]!!/(\langle k\rangle N - 1)!!$. Using Stirling's approximation for $W_{N,c}$ we get

$$W_{N,c} \simeq (\langle k\rangle N)^{-c(c-1)/2}\,e^{Ng(\omega)} \qquad (26)$$

with $\omega = c(c-1)/N$ and

$$g(\omega) = -\frac{\langle k\rangle-\omega}{2}\log\frac{\langle k\rangle}{\langle k\rangle-\omega} + \frac{\omega}{2}. \qquad (27)$$
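Eqs. (26) and (27) can be checked numerically by computing the double factorials exactly in log space with `math.lgamma`, through $(2n-1)!! = (2n)!/(2^n n!)$. This is a sketch with hypothetical function names, assuming $\langle k\rangle N$ is even.

```python
import math

def log_odd_double_factorial(m):
    """log((m - 1)!!) for even m >= 0, using (2n - 1)!! = (2n)! / (2^n n!)."""
    n = m // 2
    return math.lgamma(2 * n + 1) - n * math.log(2) - math.lgamma(n + 1)

def log_w_exact(n_nodes, mean_k, c):
    """log W_{N,c} = log([<k>N - c(c-1) - 1]!! / (<k>N - 1)!!)."""
    stubs = round(mean_k * n_nodes)
    return log_odd_double_factorial(stubs - c * (c - 1)) - log_odd_double_factorial(stubs)

def log_w_stirling(n_nodes, mean_k, c):
    """log of the right-hand side of Eq. (26), with g(omega) from Eq. (27)."""
    w = c * (c - 1) / n_nodes
    g = -0.5 * (mean_k - w) * math.log(mean_k / (mean_k - w)) + 0.5 * w
    return -0.5 * c * (c - 1) * math.log(mean_k * n_nodes) + n_nodes * g

for c in (3, 6, 12):
    print(c, log_w_exact(10_000, 4.0, c), log_w_stirling(10_000, 4.0, c))
```

For $\omega \ll \langle k\rangle$ one has $g(\omega) \simeq \omega^2/(4\langle k\rangle)$, so the factor $e^{Ng(\omega)}$ is close to one as long as $c \ll N^{1/4}$.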



Thus we get 



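$$\langle N_c\rangle = \sum_{\{n(k)\}}\prod_{k=c-1}^{K}\binom{N(k)}{n(k)}\,\kappa_c(k)^{n(k)}\,e^{Ng(\omega)} \qquad (28)$$

where

$$\kappa_c(k) = \frac{k!}{(k-c+1)!\,(\langle k\rangle N)^{(c-1)/2}}. \qquad (29)$$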



Expression (28) for the average number of cliques in the Molloy-Reed ensemble differs from the equivalent expression (4) in the hidden variable ensemble i) by the substitution $\theta^{c-1} \to \kappa_c(k)$; ii) by the factor $\exp[Ng(\omega)]$; and iii) by the fact that the average is performed only over the nodes with connectivity $k \ge c-1$. Following the same steps as in the hidden variable ensemble, we get

$$\langle N_c\rangle = \int\frac{dy}{2\pi i}\exp\left\{yc + N\left\langle\log\left[1+\kappa_c(k)\,e^{-y}\right]\right\rangle + Ng(\omega)\right\} \qquad (30)$$

with $g(\omega)$ given by Eq. (27) and the average performed over the $N(k)$ distribution with a lower cutoff at $k = c-1$.

Evaluating (30) by the saddle point method and following the steps described in the preceding section, we get the following approximate expression for the average number of cliques:

$$\langle N_c\rangle \simeq \frac{e^{Ng(\omega)}}{\sqrt{2\pi c}}\left[\frac{eN\langle\kappa_c(k)\rangle}{c}\right]^{c} \qquad (31)$$
where this approximation is valid asymptotically for cliques of sizes $c < c^*$. We note that $c^*$ is fixed by the condition

$$\frac{c}{N}\,\frac{\kappa_c(K)}{\langle\kappa_c(k)\rangle} \ll 1.$$

Results similar to those found for the hidden variable ensemble also apply to the second moment of the number of cliques in the Molloy-Reed ensemble.



3 Conclusions 

In conclusion, we have calculated the first and second moments of the number of cliques in random scale-free network ensembles. This calculation shows that these networks, provided that the power-law exponent satisfies $\gamma < 3$, have many small cliques and a large clique number. In particular, the clique number diverges with the network size as long as $\gamma < 3$, which is a surprising result since in Erdős and Rényi random networks with finite average degree the maximal clique size is $c_{\max} \le 3$. Moreover, we have shown that when the cutoff is the maximal allowed one (i.e., in the terminology of this paper, when $\epsilon = 0$) there can be large fluctuations of the clique number, whereas for $\epsilon \neq 0$ the fluctuations are small.



A Calculation of the upper bound for the clique number in the case $\epsilon = 0$ in the hidden variable ensemble

The evaluation of the upper bound for the clique number in the subtle case $\epsilon = 0$ deserves particular attention. To address this problem we start by restating the main results for the average number of cliques in the hidden variable ensemble. The expression (6) for the average number of cliques reads

$$\langle N_c\rangle \simeq \frac{e^{f(y^*)}}{\sqrt{2\pi|f''(y^*)|}} \qquad (A.1)$$

where $y^*$ is given by the saddle point equation (7)

$$\frac{c}{N} = \left\langle\frac{\theta^{c-1}e^{-y^*}}{1+\theta^{c-1}e^{-y^*}}\right\rangle \qquad (A.2)$$

and $f(y^*) = y^*c + N\langle\log[1+\theta^{c-1}e^{-y^*}]\rangle$, while $\theta$ is given by

$$\theta = \frac{q}{\sqrt{\langle q\rangle N}}. \qquad (A.3)$$



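If $y^* > 0$ then

$$y^*c + N\langle\log[1+\theta^{c-1}e^{-y^*}]\rangle < y^*c + N\langle\log[1+\theta^{c-1}]\rangle. \qquad (A.4)$$

On the one hand, from the saddle point equation (A.2), since $x/(1+x) < x$ for $x > 0$, we have

$$e^{y^*} < \frac{N}{c}\langle\theta^{c-1}\rangle; \qquad (A.5)$$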

on the other side we have that, 

7V(log[l + r-^]) < 2N ^ ^_l^]ll~l-r ) = (^"6) 
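Moreover, the second derivative $f''(y^*)$ satisfies

$$f''(y^*) = N\left\langle\frac{\theta^{c-1}e^{-y^*}}{\left(1+\theta^{c-1}e^{-y^*}\right)^2}\right\rangle \ge \frac{N}{1+e^{-y^*}}\left\langle\frac{\theta^{c-1}e^{-y^*}}{1+\theta^{c-1}e^{-y^*}}\right\rangle = \frac{c}{1+e^{-y^*}} > \frac{c}{2}, \qquad (A.7)$$

where we used $\theta \le 1$ for $\epsilon = 0$ and $y^* > 0$. Consequently, putting together Eqs. (A.4), (A.5), (A.6) and (A.7), the average number of cliques $\langle N_c\rangle$ (A.1) satisfies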



which, together with the inequality (11), provides the upper bound (12) for the clique number, which scales with the system size as shown in Table 1.

At this point we must check self-consistently that indeed, if $c > \bar c$, then $y^* > 0$. To prove this, we suppose on the contrary that $y^* < 0$. In this eventuality the saddle point equation can be rewritten as

where $\tilde q = Qe^{y^*/(c-1)} < Q$. Expanding the two terms in series we get

^ = e-^*^iV^(F,,e + G,,,) (A.IO) 



with 



c 



7 



Therefore, for $c \gg 1$, we have



Let us observe that $\bar c$, given by the values in Table 1, always satisfies $\bar c \ll N^{1/2}$ for $\gamma > 1$. Moreover, as long as $c \to \infty$ with $c \ll N^{1/2}$, we get from expression (??) that $y^* \to 0^+$. Consequently, assuming $y^* < 0$ for $c > \bar c$, we have reached a contradiction. This proves that under the hypothesis $c > \bar c$ the saddle point solution $y^*$ is always positive, i.e. $y^* > 0$, as we assumed at the beginning of the paragraph.